class: center, middle, inverse, title-slide

.title[
# Phase II: Using Our Toolbox
]
.subtitle[
## Module 6: Spatial Awareness
]
.author[
### Dr. Christopher Kenaley
]
.institute[
### Boston College
]
.date[
### 2025/10/31
]

---
class: top

# In class today

<!-- Add icon library -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.14.0/css/all.min.css">

.pull-left[
Today we'll ...

- Intro to random forest models
]

.pull-right[
<img src="https://dfzljdn9uc3pi.cloudfront.net/2020/8262/1/fig-1-2x.jpg" width="450">
]

---

# What is a Random Forest?

.pull-left[
- **Ensemble method** built from many decision trees
- Each tree is trained on a **bootstrap sample** of the data
- Predictions are averaged (regression) or voted on (classification)
- Reduces overfitting and improves generalization
]

.pull-right[
<!-- figure: random forest schematic -->
]

---

# 🌲 What is a Random Forest?

.pull-left[
### Think of it like a team of decision trees!

- Each **tree** makes a prediction (like a vote).
- Trees are trained on **different random subsets** of the data.
- Each tree also sees **different random sets of variables**.
- The forest's final prediction is the **average (regression)** or **majority vote (classification)** across all trees.

💡 The idea:

> Many weak but diverse trees together make a strong, stable model.
]

.pull-right[
<!-- figure: random forest schematic -->

.small[Each tree sees a different sample of data and features.]
]

---

# 🌲 Classification vs. Regression Forests

.pull-left[
### **Classification Forest**

- Used when the **response variable is a category** (e.g., species, gender, habitat type, house color).
- Each tree "votes" for a class label.
- The **majority vote** across all trees = the prediction.

**Example**

``` r
Species ~ Sepal.Length + Petal.Width
```

```
## Species ~ Sepal.Length + Petal.Width
```

``` r
# Predicts setosa, versicolor, or
# virginica.
```

🎯 Goal: minimize misclassification error.
]

.pull-right[
### **Regression Forest**

- Used when the **response variable is numeric** (e.g., body mass, growth rate, temperature).
- Each tree predicts a number.
- The **average** across all trees = the prediction.

**Example**

``` r
Sepal.Width ~ Sepal.Length + Petal.Width
```

```
## Sepal.Width ~ Sepal.Length + Petal.Width
```

🎯 Goal: minimize mean squared error (MSE).
]

---

# Installing and loading

``` r
# install.packages('randomForest')
library(randomForest)

# Example dataset:
data(iris)
head(iris)
```

```
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
```

---

# ⚙️ Fitting a Random Forest in R

## 🔧 Steps

.pull-left[
### 1. Fit the model

``` r
set.seed(123)  # for reproducibility

rf_model <- randomForest(
  Species ~ .,       # response ~ predictors
  data = iris,
  ntree = 500,       # number of trees
  mtry = 2,          # predictors tried per split
  importance = TRUE  # compute variable importance
)
```
]

.pull-right[
### What these do

- `ntree` → more trees = a smoother, more stable forest
- `mtry` → how many predictors are tested at each split
- `importance` → tracks which variables matter most
]

---

# ⚙️ Fitting a Random Forest in R

.pull-left[
### 2. Check the model output

``` r
rf_model
```

```
## 
## Call:
##  randomForest(formula = Species ~ ., data = iris, ntree = 500, mtry = 2, importance = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 2
## 
##         OOB estimate of  error rate: 4.67%
## Confusion matrix:
##            setosa versicolor virginica class.error
## setosa         50          0         0        0.00
## versicolor      0         47         3        0.06
## virginica       0          4        46        0.08
```
]

.pull-right[
### What to look for

- OOB (out-of-bag) error → error estimated on the data each tree never saw (here, 4.67%)
- Confusion matrix → which classes the forest mixes up
]
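---

# ⚙️ Fitting a Random Forest in R

### Picking `mtry` instead of guessing

The output above reports OOB error for `mtry = 2`. One way to choose `mtry` is a minimal sketch like the one below, using `tuneRF()` from the randomForest package, which steps `mtry` up and down and keeps searching while the OOB error improves. The settings shown are illustrative, not from the original slides:

``` r
set.seed(123)

# Try several values of mtry, judging each by OOB error
tuneRF(
  x = iris[, -5],      # predictors only (drop Species)
  y = iris$Species,    # response
  ntreeTry   = 500,    # trees grown per candidate mtry
  stepFactor = 1.5,    # multiply/divide mtry by this each step
  improve    = 0.01,   # continue while OOB error improves by >= 1%
  trace      = TRUE    # print progress
)
```

Pick the `mtry` with the lowest reported OOB error, then refit with `randomForest()` as on the previous slide.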
---

# ⚙️ Fitting a Random Forest in R

.pull-left[
### 3. Plot the error

``` r
plot(rf_model)
```

<!-- figure: error rate vs. number of trees -->
]

.pull-right[
### What these do

- `plot()` → shows error rate vs. number of trees
]

---

# What Variables are Important?

.pull-left[
### 🧠 The big idea

- A **Random Forest** learns from many predictors (features).
- But not all predictors help equally!
- **Variable importance** shows which ones matter most for making good predictions.

💡 Imagine asking:

> "If I hide this variable, does the forest get worse at predicting?"

If accuracy drops a lot → that variable is **important**.
If accuracy stays the same → that variable probably isn't helping much.
]

.pull-right[
### Assessing importance

``` r
# Show how important each variable is
importance(rf_model)
```

```
##                 setosa versicolor virginica MeanDecreaseAccuracy
## Sepal.Length  6.495113   7.588736  7.690949            11.098105
## Sepal.Width   4.364415   1.038790  4.546158             5.144762
## Petal.Length 22.157039  34.060864 27.853055            33.535680
## Petal.Width  22.482214  32.571528 30.595022            32.839577
##              MeanDecreaseGini
## Sepal.Length         9.798189
## Sepal.Width          2.236535
## Petal.Length        43.093269
## Petal.Width         44.042372
```
]

---

# What Variables are Important?

.pull-left[
### 🧠 The big idea

- A **Random Forest** learns from many predictors (features).
- But not all predictors help equally!
- **Variable importance** shows which ones matter most for making good predictions.

💡 Imagine asking:

> "If I hide this variable, does the forest get worse at predicting?"

If accuracy drops a lot → that variable is **important**.
If accuracy stays the same → that variable probably isn't helping much.
]

.pull-right[
### Assessing importance

``` r
# Make a quick plot
varImpPlot(rf_model)
```

<!-- figure: variable importance plot -->

.small[
Longer bars `=` variables that help the model the most.
Shorter bars `=` variables with little impact on predictions.
]
]

---

# 🎯 Making Predictions with a Random Forest

.pull-left[
### 🧠 The idea

Once your forest is trained, you can use it to **predict new outcomes**!

- The model has "learned" patterns from the training data.
- You give it new observations (with predictor values).
- Each tree makes a prediction → the forest combines them for a final answer.

🌳 Many trees → one decision!
]

.pull-right[
### In R

``` r
# Make predictions
new_obs <- iris[1:5, -5]  # drop Species column
predict(rf_model, new_obs)
```

```
##      1      2      3      4      5 
## setosa setosa setosa setosa setosa 
## Levels: setosa versicolor virginica
```

🧩 For classification: each tree votes for a class; the forest picks the majority vote.

For regression: each tree predicts a number; the forest takes the average.
]
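---

# 🎯 Making Predictions with a Random Forest

### Beyond hard labels

`predict()` for a classification forest also accepts `type = "prob"`, which returns the fraction of trees voting for each class rather than a single label. A minimal sketch, reusing `rf_model` and `new_obs` from the previous slide:

``` r
# Per-class vote fractions; each row sums to 1
predict(rf_model, new_obs, type = "prob")
```

Rows with one value near 1 mean the trees agree strongly; values closer to an even split flag observations the forest is unsure about.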
---

# 🧩 Comparing Predictions to Actual Classes

.pull-left[
### 🎯 Why compare?

- We want to see **how often the model gets it right!**
- Comparing predicted and actual classes tells us how accurate our Random Forest is.
- A **confusion matrix** summarizes correct vs. incorrect predictions.

💡 The more predictions on the diagonal → the better the model!
]

.pull-right[
### Compare predictions to actual

``` r
pred <- predict(rf_model, iris)

# Confusion matrix
table(Predicted = pred, Actual = iris$Species)
```

```
##             Actual
## Predicted    setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         50         0
##   virginica       0          0        50
```

``` r
# Model accuracy
mean(pred == iris$Species)
```

```
## [1] 1
```

.small[
The confusion matrix shows where the model is right or wrong.
Accuracy = proportion of correct predictions. Here we predicted the **training** data, so perfect accuracy is optimistic; the OOB error (4.67%) is a fairer estimate.
]
]
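---

# 🧩 Comparing Predictions to Actual Classes

### A fairer test: hold out data

A more honest accuracy check uses rows the forest never saw during training. A minimal sketch of a simple 70/30 train/test split; the split fraction and the object names `train_idx`, `rf_split`, and `pred_test` are illustrative, not from the original slides:

``` r
set.seed(123)

# Hold out 30% of rows that the forest never trains on
train_idx <- sample(nrow(iris), 0.7 * nrow(iris))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

rf_split  <- randomForest(Species ~ ., data = train,
                          ntree = 500, mtry = 2)
pred_test <- predict(rf_split, test)

# Held-out confusion matrix and accuracy
table(Predicted = pred_test, Actual = test$Species)
mean(pred_test == test$Species)
```

Held-out accuracy should land near what the OOB error already suggests (about 95% here), since both are computed on data the trees did not train on.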